Noise Robustness of Traditional Features for Macedonian Voice Dialing ASR
Abstract
Today's Automatic Speech Recognition (ASR) systems are widely deployed in real-world application scenarios that are often characterized by suboptimal operating conditions. Their noise robustness has thus become a crucial parameter in assessing ASR performance in the field. The paper examines the noise robustness of traditional ASR feature sets as applied to a Voice Dialing Application built for Macedonian. The analysis focuses on the following features: Linear Prediction Reflection Coefficients, Mel-Frequency Cepstral Coefficients and Perceptual Linear Prediction Coefficients. The ASR system was trained on clean data, and in the evaluation phase the noise level in the test data was varied by adding white and babble noise. Results are plotted for each feature type across varying SNR conditions.
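The abstract does not say how the noise was mixed into the test data, so the following is only a minimal sketch of one common approach: scale a noise recording so that the ratio of speech power to noise power matches a target SNR in dB, then add it to the clean signal. The function names `mix_at_snr` and `measured_snr_db`, the synthetic tone standing in for speech, and the Gaussian samples standing in for white noise are all assumptions made for illustration, not details from the paper.

```python
import math
import random


def mean_power(samples):
    """Average power (mean of squared samples) of a signal."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mix has the requested SNR in dB.

    Hypothetical helper: the noise is repeated or trimmed to the speech
    length, then multiplied by a gain chosen so that
    P_speech / (gain^2 * P_noise) == 10 ** (snr_db / 10).
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    reps = -(-len(speech) // len(noise))  # ceiling division
    noise = (list(noise) * reps)[:len(speech)]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR in dB of `noisy`, given the clean `speech` it was built from."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))


# Demo with stand-in signals (assumptions, not the paper's data):
# a 440 Hz tone at 8 kHz plays the role of clean speech,
# and Gaussian samples play the role of white noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
white = [rng.gauss(0.0, 1.0) for _ in range(8000)]
for target in (20, 10, 0):
    noisy = mix_at_snr(clean, white, target)
    print(target, round(measured_snr_db(clean, noisy), 3))
```

Babble noise would be handled the same way, with a multi-speaker recording passed as `noise` in place of the Gaussian samples. Sweeping `snr_db` over a range of values gives one set of test conditions per SNR level, which is the kind of sweep the abstract describes.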
Similar resources
Techniques for robust speech recognition in the car environment
The use of voice commands or navigation features in the car is becoming a necessity. As keyboard and display interfaces cannot be used safely while driving, much effort has been made to make automatic speech recognition (ASR) and Text-to-Speech synthesis (TTS) ubiquitous features in the car. From voice dialing to car navigation, the requirements for voice technology vary greatly. While the use ...
Two-layered audio-visual integration in voice activity detection and automatic speech recognition for robots
Automatic Speech Recognition (ASR) which plays an important role in human-robot interaction should be noise-robust because robots are expected to work in noisy environments. Audio-Visual (AV) integration is one of the key ideas to improve the robustness in such environments. This paper proposes two-layered AV integration for ASR which applies AV integration to Voice Activity Detection (VAD) and...
Phone-duration-dependent long-term dynamic features for a stochastic model-based voice activity detection
Accurate voice activity detection (VAD) is important for robust automatic speech recognition (ASR) systems. This paper proposes noise-robust VAD using long-term temporal information in speech. Long-term temporal information has been an ASR focus recently, but has not been investigated sufficiently for VAD. This paper describes an attempt to incorporate long-term temporal information into a feat...
ASR systems in Noisy Environment: Analysis and Solutions for Increasing Noise Robustness
This paper analyzes Automatic Speech Recognition (ASR) suitable for use in noisy environments and suggests an optimum configuration under various noise conditions. The behavior of standard parameterization techniques was analyzed from the viewpoint of robustness against background noise. It was done for Mel-frequency cepstral coefficients (MFCC), Perceptual linear predictive ...
Evaluation of voice activity detection by combining multiple features with weight adaptation
For noise-robust automatic speech recognition (ASR), we propose a novel voice activity detection (VAD) method based on a combination of multiple features. The scheme uses a weighted combination of four conventional VAD features: amplitude level, zero crossing rate, spectral information, and Gaussian mixture model (GMM) likelihood. The weights for combination are adaptively updated using minimum...